Automatic Resolution of Ambiguous Abbreviations in Biomedical Texts using Support Vector Machines and One Sense Per Discourse Hypothesis

Authors

  • Zhonghua YU
  • Yoshimasa TSURUOKA
  • Jun’ichi TSUJII
Abstract

We present an algorithm for disambiguating abbreviations in Medline abstracts using Support Vector Machines (SVM) and the one sense per discourse hypothesis. In contrast to other work that uses SVM for natural language disambiguation, which always depends on handcrafted training and testing data, the algorithm presented here extracts its training and testing data automatically by searching the texts for the long forms of abbreviations and applying the one sense per discourse hypothesis. In the testing phase, we also use this hypothesis to unify the outputs of the classifier via majority voting. Our experimental results demonstrate that SVM is a promising technique for abbreviation disambiguation and that majority voting in the testing phase improves accuracy from 82.35% to 84.31%.
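
The two uses of the one sense per discourse hypothesis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the substring matching of long forms, the context window size, and the example senses of "AR" are all assumptions made for illustration, and the SVM classifier itself is omitted.

```python
import re
from collections import Counter


def label_by_long_form(abstract, long_forms):
    """Return the sense label for an abstract, or None if it cannot be labeled.

    Assumption: a sense is identified by one of its known long forms
    appearing anywhere in the abstract; under the one sense per discourse
    hypothesis, every occurrence of the abbreviation then shares that sense.
    """
    lowered = abstract.lower()
    found = [lf for lf in long_forms if lf.lower() in lowered]
    # Label only when exactly one candidate long form is present.
    return found[0] if len(found) == 1 else None


def contexts(abstract, abbreviation, window=5):
    """Collect the words surrounding each occurrence of the abbreviation.

    These windows stand in for the features a classifier would be trained on.
    """
    tokens = re.findall(r"\w+", abstract)
    result = []
    for i, token in enumerate(tokens):
        if token == abbreviation:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            result.append(left + right)
    return result


def majority_vote(predictions):
    """Replace per-occurrence predictions in one document with the most common one."""
    if not predictions:
        return []
    winner, _ = Counter(predictions).most_common(1)[0]
    return [winner] * len(predictions)


# Hypothetical example: two candidate long forms for the abbreviation "AR".
abstract = "The androgen receptor (AR) is expressed. AR binds DNA."
sense = label_by_long_form(abstract, ["androgen receptor", "amphiregulin"])
training_examples = [(ctx, sense) for ctx in contexts(abstract, "AR", window=2)]
unified = majority_vote(["androgen receptor", "amphiregulin", "androgen receptor"])
```

Here `training_examples` pairs each context window with the sense found through the long form, and `majority_vote` shows how disagreeing predictions within one abstract would be unified at test time.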


Similar resources

Resolving abbreviations to their senses in Medline

MOTIVATION: Biological literature contains many abbreviations with one particular sense in each document. However, most abbreviations do not have a unique sense across the literature. Furthermore, many documents do not contain the long forms of the abbreviations. Resolving an abbreviation in a document consists of retrieving its sense in use. Abbreviation resolution improves accuracy of document...


Abbreviation Disambiguation: Experiments with Various Variants of the One Sense per Discourse Hypothesis

Abbreviations are widely used in many languages, and disambiguation of abbreviations is critical. In this research, a structured process that attempts to solve the problem of abbreviation ambiguity is presented. Various baseline methods have been explored, including context-related methods and statistical methods. Almost all methods are domain-independent and language-independent. The applicatio...


Translation of Acronyms, Initialisms and Abbreviations (AIA) in Persian Political and Sport Journalistic Texts

The different writing systems of English and Persian make the translation of acronyms, initialisms and abbreviations challenging. This study aimed to find which strategies were applied most frequently in translating acronyms, initialisms and abbreviations from English to Persian, especially in journalistic texts. The study was based on the Descriptive Translation Study of Toury and strategies pr...


Automatic Interpretation of UltraCam Imagery by Combination of Support Vector Machine and Knowledge-based Systems

With the development of digital sensors, an increasing number of high-resolution images are available. Manual interpretation of these images is not feasible, which necessitates practical, fast and automatic solutions to environmental and location-based management problems. Land cover classification using high-resolution imagery is a difficult process because of the c...


An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification

Due to the exponential growth of electronic texts, their organization and management require a tool that provides users with the information and data they search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially text processing, one of the most basic tasks is automatic text classification. Moreover, text ...




Publication date: 2003